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Abstract 

We consider the problem of sparse estimation via a lasso-type penalized like- 
lihood procedure in a factor analysis model. Typically, the model estimation is 
done under the assumption that the common factors are orthogonal (uncorrelated). 
However, the lasso-type penalization method based on the orthogonal model can 
often estimate a completely different model from that with the true factor structure 
when the common factors are correlated. In order to overcome this problem, we 
propose to incorporate a factor correlation into the model, and estimate the factor 
correlation along with parameters included in the orthogonal model by maximum 
penalized likelihood procedure. An entire solution path is computed by the EM 
algorithm with coordinate descent, which permits the application to a wide variety 
of convex and nonconvex penalties. The proposed method can provide sufficiently 
sparse solutions, and be applied to the data where the number of variables is larger 
than the number of observations. Monte Carlo simulations are conducted to in- 
vestigate the effectiveness of our modeling strategies. The results show that the 
lasso-type penalization based on the orthogonal model cannot often approximate 
the true factor structure, whereas our approach performs well in various situations. 
The usefulness of the proposed procedure is also illustrated through the analysis of 
real data. 

Key Words: Nonconvex penalty, Oblique structure. Rotation technique. Penalized like- 
lihood factor analysis 



1 Introduction 

Factor analysis provides a practical tool for exploring the covariance structure among 
a set of observed random variables by construction of a smaller number of random vari- 
ables called common factors. In exploratory factor analysis, a traditional estimation 
procedure in use is the following two-step approach: the model is estimated by the maxi- 
mum likelihood method under the assumption that the common factors are uncorrelated 



(orthogonal), and then rotation techniques, such as the varimax method (IKaiserl Il958l ) 
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and the promax method (jHendrickson and Whitdll964l ). are utihzed to find sparse factor 
loadings. However, it is well known that the maximum likelihood m ethod often yields 
unstable estimates because of overparametrization (e.g., 



Akaike 



19871 ). In particular, the 



commonly-used algorit 



ims for maximum l i kelihood factor analysis (e. g., iJoreskog 



Jennrich and Robinson 



1967; 



1969 



Clarke 



197o[ Lawley and Maxwell 1971) cannot often be 



applied when the number of variables is larger than the number of observations. Further- 
more, the rotation techniques cannot often produce a sufficiently sparse solution. In order 
to overcome these difficulties, we appl y a penalized lik elihood procedure that produces 



the sparse solutions, such as the lasso (ITibshirani 



199 



The lass o-type penalized like li hood facto r analysis h a s been recently studied by several 



researchers. 



Ning and Georgiou (iml) and 



Choi et al. 



(1201 ll ) applied the weighted lasso 
to obtain sparse factor loadings, and numerically demonstrated that the penalization 
method often outperfo r med t he rotation technique with maximum likelihood procedure. 



Hirose and Yamamotd (120121 ) showed that the penalization method is a generalization 



of the rotation technique with maximum likelihood method, and applied the nonconvex 



penalties such as minimax concave penalty (MC-|- 
absolute deviation (SCAD, 



Zhan 



Fan and Li 



SB) 



and smoothly clipped 
20011 ) to achieve sparser solutions than the lasso. 



In these studies, the common factors are assumed to be uncorrelated (orthogonal) as is 
the case with the maximum likelihood exploratory factor analysis. In some cases, however, 
analysts may prefer to relax the requirement that the common factors are orthogonal (e.g.. 



Mulaik 



19721 ). Moreover, we found that the lasso- type penalization technique based on 



the orthogonal model can often estimate a completely different model from that with the 
true factor structure when the common factors are correlated (oblique). Empirically, the 
estimated factor loadings in the first column often become dense (i.e., all elements are 
non-zero), even if the first column of true loading matrix is sparse. 

In order to handle this fundamental problem, we propose to incorporate a factor 
correlation into the model, and estimate the factor correlation along with parameters 
included in the orthogonal model b y maximum penalized like lihood procedure. A pathwise 
algorithm via the E M algorithm (iRubin and Thaverl Il982l ) with coordinate descent for 



nonconv ex penalties (iMazumder et a 



20111 1 is introduced according to the basic idea 



given by iHirose and Yamamotd (120121 ). Our algorithm produces the entire solution path 
for a wide variety of convex and nonconvex penalties including the lasso, SCAD, and MC-|- 
family. Furthermore, the proposed methodology can provide sparser solutions than the 
rotation technique with maximum likelihood method, and be applied to the data where 
the number of variables is larger than the number of observations. 

The remainder of this paper is organized as follows: Section 2 shows that the lasso- 
type penalized likelihood factor analysis based on the orthogonal model cannot often 
approximate the oblique structure. In Section 3, we introduce a penalized factor analysis 



2 



via the oblique model, and provide a computational algorithm based on the EM algorithm 
and coordinate descent to obtain the entire solution path. Section 4 presents numerical 
results for both artificial and real datasets. Some concluding remarks are given in Section 
5. 

2 Penalized likelihood factor analysis based on the 
orthogonal model may not approximate the oblique 
structure 

2.1 Model and Estimation 

We briefiy d escribe a lasso-type penalized likelihood factor analysis based on the or - 



thogonal model fIChoi et al.ll201ll : iNing and Georgiou 



2011 



Hirose and Yamamoto 



2OI2I). 



Suppose that X = {Xi, . . . , Xp)'^ is a p-dimensional observable random vec tor with mean 



vector ^ and variance-covariance matrix S. The factor analysis model (e.g.. lMulaiklll972l ) 
is 

X = fi + AF + £, 

where A = {Xij) is a p x m matrix of factor loadings, and F = {Fi,--- , -Fm)^ and 
£ = (ei, ■ ■ ■ ,ep)'^ are unobservable random vectors. The elements of F and e are called 
common factors and unique factors, respectively. It is assumed that the common factors F 
and the unique factors e are multivariate-normally distributed with E{F) = 0, E{e) = 0, 
E{FF^) = Im, E{ee^) = and are independent (i.e., E{Fe^) = O). Here is the 
mxm identity matrix, and ^ is a. pxp diagonal matrix with the i-th diagonal element ipi, 
which is called unique variance. Under these assumptions, the observable random vector 
X is multivariate-normally distributed with variance-covariance matrix AA-^ + ^. 

Let Xi, - ■ ■ , a; TV be a random sample of observations from the p-dimensional normal 
population Np^fi, AA'^ + The estimates of factor loadings and unique variances, say, 
Aort and ^'ort ("ort" is an abbreviation for orthogonal), are obtained by maximizing the 
penalized log-likelihood function 

(Aort, *ort) = argmaxC*(A, 

A,* ^ 

where £°''*(A, ^) is the penalized log-likelihood function 

p m 

i=i j=i 

Here £°''*(A, ^) is the log-likelihood function 

r\A, *) = -y blog(27r) + log |AA'^ + *| + tr{(AA^ + *)-^S}] , 
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P(-) is a penalty function, and p is a regularization parameter. The matrix S = (sij) is 
the sample variance-covariance matrix. 

The lasso-type penalty function P(-) produces sparse solutions for some p, i.e., some of 
the factor loadings can be estimated by exactly zero . The lasso is continuous and f a st, but 



biase d and then estimates an overly dense model (jZou 



2006; 



Zhao and Yu 



200 



Zhang 



2OIOI ). Typically, a nonco ncave penahzation procedure such as MC-|- ( lZhangll2010l ) and 



SCAD fiFan and Li 



200 ll ) can achieve sparser models than the lasso. For example, the 



MC+ flZhang|l2010f ) is given by 
pP{\e\;p;^)=p 

= P 



X 

PI 



dx 



2p7 



/(l^l<P7) + %^/(l^l>P7). 



For each value of p > 0, 7 — ?■ 00 yields soft threshold operator (i.e., lasso penalty) and 
7 — 7- 1+ produces hard threshold operator. 



2.2 Problem of the lasso via orthogonal model 

The lasso-type penalization based on the orthogonal model can perform well when the 
true common factors are uncorrelated. In practical situations, however, the true common 
factors may often be correlated: ElFF"^] = $ with $ being the factor correlation. In 
this case, the covariance matrix of the observed variables X is expressed as A$A-^ + 
If each factor is highly correlated to each other, i.e., the absolute values non-diagonal 
elements of $ are large, the lasso based on the orthogonal model can often estimate a 
completely different model from that with the true factor structure. A typical example 
of this phenomena is given as follows: 

Example 2.1. Suppose that the true factor loadings, unique variances and factor corre- 
lation are given by 

/ 0.9 0.9 0.9 0.0 0.0 0.0 ^.nioT * - \ 
V 0.0 0.0 0.0 0.9 0.9 0.9 J ' ^ V 0.6 1.0 J ' 

We generated 50 observations from X ~ A^6(0, A$A"^ + The model was estimated by 
the penalized likelihood method with P{6) = 9 (i.e., the lasso) and p = 0.01. The estimated 
factor landings were 

^ _ / 0.87 0.83 0.89 0.59 0.53 0.54 \^ 

~ V 0-00 0-05 -0.02 0.71 0.62 0.62 J ' ^ ^ 

Because the lasso tends to produce some of the loadings being zero, A12 , A22 and A32 were 
close to or exactly zero. However, A41, A51 and Aei were far from zero although the true 
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parameters are zero. The lasso based on the orthogonal model was not able to approximate 
the true factor structure. 

The problem that the lasso based on orthogonal model cannot approximate the true 
factor structure is closely related to the rotation problem. In orthogonal model, the 
true covariance matrix A$A-^ + ^' is estimated by Aort-A^j.^ + ^'ort, which means Aort 
approximates AG, where G = {gu') is an m x m matrix that satisfies GG"^ = The 
matrix G is not an identity matrix unless the factor correlation $ is an identity matrix, 
so that Aort is not always close to A even if Aort ~ AG. 

Furthermore, the matrix G can have a rotational indeterminacy, since GG"'" = G*(G*)^ = 
$, where G* = GT with T being an arbitrary orthogonal matrix. The rotational inde- 
terminacy of G leads to i{AG, ^) = i{AG*, Thus, if Aort and ^'ort are expressed as 
Aort = AG and ^'ort = the matrix G is obtained by solving the following problem: 

max£™*(AG,*) s.t., GG^ = 

G 

which is equivalent to 

p m 

min^^P(|A.,|) s.t., GG^ = $, (2) 

i=i j=i 

where Xij is the {i,j)-th element of AG. 

How is the matrix G estimated? To explain this, we assume that the true factor 
loadings A possess perfect simple structure, that is, each row has at most one nonzero 
element. The problem in ([2]) is then written as 

m m 

minJ^^PKH^n'l) s.t., GG^ = $, (3) 

i=l i'=l 

where Wui are positive values. Because the objective function is based on the Li loss, 
some of the elements of G become exactly zero. Empirically, one of the elements of G 
often becomes 1. When ggr = 1, we have ggi' = (i' 7^ r) and gir = 4>ir {i 7^ q), so 
that all elements of r-th column of AG become non-zero unless = 0, which does not 
approximate the perfect simple structure. In this way, the lasso-type penalization via 
orthogonal model can often estimate a completely different model from that with the true 
factor structure when the common factors are highly correlated. 

Example 2.2. In Example \2.1\ the problem in ^ is written as 

niin(|5(ii| + l^ial + 1521! + 1522!) s.t., GG"^ = 
The solution of G is given by 

_ fl.O 0.0\ 



f 1.0 0.6 \ 
V 0.6 1.0 J ■ 
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In this case, we have 

/ 0.90 0.90 0.90 0.54 0.54 0.54 V 
~ V 0.00 0.00 0.00 0.72 0.72 0.72 J ' 

which is quite similar to the maximum penalized likelihood estimates of factor loadings 
based on orthogonal model in (Qp. 

3 Estimation of oblique structure via penalized like- 
lihood factor analysis 

The lasso-type penalized likelihood factor analysis based on the orthogonal model 
cannot often approximate the oblique structure as described in the Section 12. 2[ In this 
Section, we propose to incorporate a factor correlation into the model, and estimate the 
oblique model by maximum penalized likelihood procedure. 



3.1 Model Estimation 

Let -^(A, $) be the log-likelihood function based on the oblique model 

£(A, $) = -y [plog(27r) + log |A$A^ + *| + tr{(A$A^ + *)"^S}] , (4) 
and ^p(A, be the penalized log-likelihood function 

p m 

i=i j=i 

We estimate the factor loadings, unique variance, and factor correlation, say, Aobi, ^obi, 
and <&obi ("obi" is an abbreviation for oblique), by maximum penalized likelihood proce- 
dure simultaneously: 

(Aobi, *obi, ^obi) = arg max ip{A, 

Example 3.1. The lasso based on oblique model was applied to the dataset used in the 
Example \2.1\ When p = 0.01, the estimates of factor loadings were 

r. _ f 0.85 0.78 0.89 0.00 0.00 0.02 V 
~ V 0.03 0.09 0.00 0.93 0.82 0.81 ) ' 

which closely approximate the true factor loadings compared with orthogonal factor load- 
ings Aort in (Oy. 
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3.2 Algorithm 



It is well known that the solutions estimated by the lasso-type regularization methods 
are not usually expressed in a closed form mainly because the penalty term includes a non- 
differentiable function. In regression analysis, a number of researchers have proposed fast 
algorithms to obtain the entir e solutions (e.g.. Leas t angle regression 
Coor dinate descent algorithm. 



Efron et al. 



Friedman et al. 



2004; 



20071 : Generalized path seeking, Friedman 



2OI2I ). In particular, the coord inate descent algorithm is known as a remarkably fast 



algorithm ( Friedman et al. 201o[) and can also be applied to a wide va riety of convex and 



nonconvex penalties (IBreheny and Huang 



201 ll: 



Mazumder et al. 



2OIII ). Thus, we employ 



the coordinate descent algorithm to obtain the entire solution. 

In the coordinate descent algorithm, each step is fast if an explicit formula for each 
coordinate-wise maximization is given, whereas the log-likelihood function in (j4]) may 



not lead to the explicit formula. 



EM algorithm (IRubin and Thayer 



n or der to derive the explicit formula, we apply the 



19821 ) to the penalized likelihood factor analysis. The 



coordinate descent algorithm is utilized to maximize the nonconcave function in the max- 
imization step of the EM algorithm. Because the complete-data log-likelihood function 
takes the quadratic form, the explicit formula for each coordinate-wise maximization is 
available. 



3.2.1 Update Equation for Fixed Regularization Parameter 

First, we provide the update equations of factor loadings, unique variances, and factor 
correlation when p and 7 are fixed. Suppose that Aqm, ^'oW and $oid are the current 
values of factor loadings, unique variances, and factor correlation. The model can be 
estimated by maximizing the expectation of the complete-data penalized log-likelihood 
function with respect to A, ^ and 

i?[/,^(A, *,$)] = -- J^logV^,--^^ . ^ . ^ 



p m 



-y log|$| - ytr(#-iA) -iV}_^^pP(|A,,-|) + const.. 



=1 j=i 



(6) 



where b, = M~^\l,^<ff-ls, and A = M^^ + M~'\l,^<ff^,lS<if^,l\,,M~'- Here M = 
-^oid^oid-^oid + ^oid' ^"^^ columu vector of S. The derivation of the complete- 

data penalized log-likelihood function is described in Appendix. 

The new parameter (Anew, ^ncw, ^ncw) can be computed by maximizing the complete- 
data penalized log-likelihood function, i.e., 

(Anew, *new, $new) = arg max E[l^ (A, $)]. (7) 
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The solution in ([7]) is not usually expressed in a closed form because the penalty term 
includes a nondifferentiable function, so that the coordinate descent algorithm is utilized. 

Let Xy be an (m — l)-dimensional vector (Aji, Xi2, • • • , Aj(j_i), Aj(j_|_i), . . . , Xim)'^ ■ The 
parameter Ajj can be updated by maximizing ([6]) with the other parameters ^ and 
$ being fixed, i.e., we solve the following problem: 

Xij = arg min I a.jj\% - 2 I 6^^ - ^ a^^ A^fc ) A, 



arg min 



~ \ 2 




^33 I ^33 




This is equivalent to minimizing the following penalized squared-error loss function 

S{e) = argminM(^-^)2 + p*P(|^|) 

9 \ Z 

The solution S{6) can be expressed in a closed form for a variety of convex and nonconvex 
penalties. For example, the update equation for MC+ penalty is given by 

s{e) = \ — — if|^l<P7 

[ 9 if 1^1 > p*7. 

After updating A by the coordinate descent algorithm, the new values of ^'new and 
$new are obtained by maximizing the expected penalized log-likelihood function in ([6]) as 
follows: 

{^i)ncw = Sii - 2(Af)ncwbi + (Ai)^g„A(Ai)new for Z = 1, . . . , p, 

$new = argmin{log |$| + tr($"^A)}, 

where ('i^j)new is the i-th diagonal element of ^ncw and (Aj)ncw is the i-th column of Anew 
The explicit formula of $new niay not be easily derived, because the diagonal elements 
of $ are fixed by 1. Therefore, the non-diagonal elements of $new are estimated by 
Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization procedure. 

3.2.2 Pathwise Algorithm 



A pathwise algorithm for orthogonal case has been proposed by 



Hirose and Yamamoto 



(120121 ). and we apply their algorithm to the oblique case. The pathwise algorithm can 
produce the solution for the grid of increasing p values P = {pi,...,px} and a grid 
of increasing values F = {71, . . . ,7^} efficiently, where 7t gives the lasso penalty (e.g., 
7t = 00 for MC+ family). First, we compute the lasso solution path for P = {pi, . . . , px} 
by decreasing the sequence of values for p, starting with the largest value p = px for 
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which the estimates of factor loadings A = O. Next, the value of 7t-i is selected, and the 
solutions are produced for the sequence of P = {pi, . . . ,Pk}- The solution at {'Jt-i, Pk) 



can be computed by usin g the solution at 



objective value surfaces (IMazumder et al 



7r, P h), which leads to improved and smoother 



201ll ). In the same way, for t = T — 2, . . . , 1, 



the solution at {'jt, Pk) can be computed by using the solution at (7^+1, pfc)- 

3.3 Selection of the Regular izat ion Parameter 

In this modeling procedure, it is important to select the appropriate value of the 
regularization parameter p. The following two selection procedures are introduced. 

3.3.1 Model Selection Criteria 

The selection of the regularization parameter can be viewed as a model sel ection and 



evaluation problem. In regression analysis, the degrees of freedom of the lasso (jZou et al. 



20071 ) may be used for selecting the regularization parameter. With the use of the degrees 



of freedom, the following model selection criteria are introduced: 

AIC = -2^(A, $) + 2p* 
BIG = -2£(A, $) + p* log N, 
CAIC = -2£(A,*,$) +p*(logiV + 1), 

where the number of parameters is given by p* = df{pk) + mo(mo — l)/2 + p. Here df{pk) 
is the number of nonzero parameters for the lasso penalty at p = p^, mo(mo — l)/2 is 
the number of parameters in factor correlation matrix and p is the number of parameters 
in unique variances. Note that this form ula can be applied to any value of 7 if the 



reparameterization of the penalty function (IMazumder et al.ll201ll ) is carried out, because 



the reparameterization constrains the degrees of freedom to be constant as 7 varies. 

3.3.2 Goodness-of-Fit Index 

It may be easy to interpret the estimated model when the factor loadings are suffi- 
ciently sparse. However, a model that is too sparse does not fit the data. Therefore, it 
is reasonable to select a regularization parameter that produces sparse solutions and also 
yields large values for the following goodness-of-fit index (GFI) and the adjusted GFI 
(AGFI): 



AGFI = 1 



tr[{S-iS}2] 
p(p+l)(l-GFI) 
pip+l)-2df 
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where S = A$A-^ + ^. The GFI and AGFI take values from through 1. In our 
experience, the model is fitted well if the value of the GFI is greater than 0.9. 



3.4 Treatment for Improper Solutions 



It is well-known that the maximum likelihood estimates of unique variances can turn 
out to be zero or negative, which is refer red as the improper solutions, and many re- 
searchers h ave studied this problem (e.g.. 



Kan 



3 



Van Driel 



1978 



Anderson and Gerbing 



1984 



19981 ). In general, the occurrence of improper solutions makes converge of the algo- 

e this issue, we add a penalty with respect to ^ 



rithm slow and unstable. In order to hand! 
to (El) according to the basic idea given by 
koil\ \: 



Martin and McDonaldl ( 1l975l ) and 



Hirose et al. 



(A,*,*) 



N 



(A, $) r/tr(*-^/2g^-i 



where 77 is a tuning parameter. Note that when ipi — )■ 0, tr(^' ^/^S^' ^/^) — 00. Thus, the 
penalt y term tr(^'~^/^S^'~^/^) prevents the occurrence of impr oper solutions. 



Hirose et al. 



( 2OI1I ) derived a generalized Bayesian information criterion f Konishi et al. 2004) for se- 
lecting the appropriate value of r], whereas it is difficult to derive generalized Bayesian 
model criterion in lasso-type penalization procedure. In practice, the penalty term can 
prevent the occurrence of improper solution even when rj is very sma ll such as 0.001. 

We provide a package fane in R flR Development Core Teamll2010l ). which implements 
our algorithm to produce the entire solution path. The package fane is available from 
Comprehensive R Archive Network (GRAN) at http : //cran. r-project . org/web/packages/f anc/index . 



4 Numerical Examples 

4.1 Monte Carlo Simulations 

In the simulation study, we used three models according to the following factor load- 
ings: 

Model (A): 

/ 0.9 0.9 0.9 0.0 0.0 0.0 \^ 
~ V 0-0 0-0 0-0 0-8 0.8 0.8 J ' 

Model (B): 



0.9 


0.9 


0.9 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.8 


0.8 


0.8 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.0 


0.7 


0.7 


0.7 



Model (C): 
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A 



V 



0.9 • I25 


O25 


O25 


025 


O25 


0.8 ■ I25 


O25 


025 


O25 


O25 


0.7-125 


025 


O25 


O25 


025 


0.6-125 



where Ig is a g-dimensional vector with each element being 1, and 0^ is a g-dimensional 
zero vector. For all models, we set $ = 0.4 - + 0.6 - Imlm' ^ — diag(Im — A$A^). 
The Model (C) is a relatively large model compared with Models (A) and (B). 

For each model, 1000 data sets were generated with x ~ A'p(0, A$A"^ + The 
number of observations was = 50, 100, and 200. The model was estimated by the max- 
imum penalized likelihood method based on both orthogonal model and oblique model. 
The penalty functions were the MC+ family with 7 = 2.10 and the lasso, and the regu- 
larization parameter was selected by the AIC, BIG and CAIC. For comparison, we also 
estimated the model by the maximum likelihood method, and employed the rotation 
techniques based on the following criteria: the lasso loss criterion (both orthogonal and 
oblique models), the varimax criterion (orthogonal model) and promax criterion (oblique 
model). For example, the lasso loss criterion for oblique model is formulated as follows: 

p m 

mm^ J] |A^|, s.t. A* = AmleT, diag(T^T) = I, 

i=i j=i 

where A* = (A* ) and Amle is the maximum likelihood estimates of factor loadi ngs. Note 



that the la sso loss function is included in the class of component loss function ( iJennrich 



2004 



20061 ) 



Tables [T], |2] and [3] show the mean squared error of factor loadings, the true positive 
rate (TPR), and true negative rate (TNR). The mean squared error is defined by MSE = 
^^^'^i'' ||A — A'''^^|p/(1000pm), where A^*-* is the estimated factor loading for the s-th 
dataset. The TPR (TNR) indicates the proportion of cases where non-zero (zero) factor 
loadings correctly set to non-zero (zero). Note that the maximum likelihood estimates 
are not available when N < p, so that the results of rotation techniques based on the 
maximum likelihood estimates for N = 50 and N = 100 in Model (C) were not displayed. 
We can see that 

• The lasso-type regularization with orthogonal model yielded large MSE and small 
TNR even when the number of observations was sufficiently large, which suggests 
the orthogonal model may produce different factor structure from the true one. 

• For Model (C), although the maximum likelihood estimates were not available when 
A = 50 and N = 100, the penalized likelihood procedure via BIG and CAIC with 
MC+ relatively selected correct models. 
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Table 1: Mean squared error, the true positive rate (TPR), and true negative rate (TNR) 
for Model (A). In the second column, "obi" and "ort" indicate the oblique model and 
orthogonal model, respectively. In the last column, "P /V" indicates the promax rotation 
for oblique case, and the varimax rotation for orthogonal case. 

Penalization Rotation 
AIC BIG CMC — — 



N 






MC+ 


lasso 


MC+ 


lasso 


MC+ 


lasso 


lasso 


P/V 


50 


obi 


MSE 


0.14 


0.18 


0.14 


0.20 


0.16 


0.24 


0.17 


0.14 






TPR 


1.00 


1.00 


1.00 


1.00 


0.99 


1.00 


1.00 


1.00 






TNR 


0.75 


0.47 


0.84 


0.55 


0.86 


0.57 


0.05 


0.00 




ort 


MSE 


1.21 


1.13 


1.21 


1.08 


1.21 


1.02 


1.27 


0.53 






TPR 


0.97 


0.98 


0.97 


0.98 


0.96 


0.98 


0.98 


1.00 






TNR 


0.30 


0.20 


0.33 


0.23 


0.35 


0.26 


0.14 


0.00 


100 


obi 


MSE 


0.06 


0.09 


0.04 


0.10 


0.03 


0.11 


0.08 


0.06 






TPR 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 






TNR 


0.81 


0.46 


0.91 


0.56 


0.93 


0.58 


0.06 


0.00 




ort 


MSE 


1.03 


0.95 


1.05 


0.90 


1.05 


0.87 


1.04 


0.48 






TPR 


0.99 


0.99 


0.98 


0.99 


0.98 


0.99 


0.99 


1.00 






TNR 


0.34 


0.21 


0.35 


0.24 


0.36 


0.25 


0.14 


0.00 


200 


obi 


MSE 


0.02 


0.05 


0.01 


0.06 


0.01 


0.06 


0.04 


0.03 






TPR 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 






TNR 


0.87 


0.47 


0.97 


0.56 


0.97 


0.60 


0.07 


0.00 




ort 


MSE 


0.89 


0.84 


0.89 


0.78 


0.89 


0.77 


0.88 


0.46 






TPR 


0.99 


1.00 


0.99 


1.00 


0.99 


1.00 


1.00 


1.00 






TNR 


0.41 


0.21 


0.42 


0.26 


0.43 


0.26 


0.14 


0.00 



• The MC+ family often performed better than the lasso in terms of both the MSE 
and model consistency. 

• The BIC and CAIC may often select the correct model compared with the AIC. 



4.2 Analysis of Harman's psychological tests data 



We illustrate the proposed procedure by Harman's psychological tests data (IHarman 



19761 ). This data represents scores of = 145 subjects on the 24 psychological tests. 



The dataset is available from the datasets in the software R (IR Development Core Team 



2010l ). Table m shows the factor loadings estimated by MC+ based on both orthogonal 



and oblique models at 7 = 2.10. The value of p was selected by the BIC. With the MC+ 
based on the orthogonal model, all elements of the first column of the estimated factor 
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Table 2: Mean squared error, the true positive rate (TPR), and true negative rate (TNR) 
for Model (B). In the second column, "obi" and "ort" indicate the oblique model and 
orthogonal model, respectively. In the last column, "P /V" indicates the promax rotation 
for oblique case, and the varimax rotation for orthogonal case. 

Penalization Rotation 
AIC BIG CMC — — 



N 






MC+ 


lasso 


MC+ 


lasso 


MC+ 


lasso 


lasso 


P/V 


50 


obi 


MSE 


0.84 


0.81 


1.00 


1.21 


1.08 


1.41 


0.81 


0.72 






TPR 


0.97 


0.96 


0.91 


0.86 


0.88 


0.82 


1.00 


1.00 






TNR 


0.69 


0.47 


0.81 


0.56 


0.85 


0.59 


0.02 


0.00 




ort 


MSE 


2.56 


2.03 


2.66 


1.95 


2.66 


2.05 


2.33 


1.33 






TPR 


0.94 


0.97 


0.91 


0.90 


0.89 


0.84 


0.99 


1.00 






TNR 


0.35 


0.25 


0.40 


0.35 


0.45 


0.40 


0.14 


0.00 


100 


obi 


MSE 


0.29 


0.34 


0.44 


0.62 


0.53 


0.83 


0.41 


0.31 






TPR 


1.00 


1.00 


0.96 


0.94 


0.94 


0.90 


1.00 


1.00 






TNR 


0.77 


0.47 


0.88 


0.57 


0.90 


0.59 


0.03 


0.00 




ort 


MSE 


2.35 


2.02 


2.43 


1.80 


2.43 


1.76 


2.24 


1.13 






TPR 


0.96 


0.98 


0.94 


0.96 


0.94 


0.94 


0.99 


1.00 






TNR 


0.38 


0.23 


0.42 


0.30 


0.44 


0.34 


0.16 


0.00 


200 


obi 


MSE 


0.10 


0.16 


0.07 


0.18 


0.08 


0.22 


0.19 


0.12 






TPR 


1.00 


1.00 


1.00 


1.00 


0.99 


0.99 


1.00 


1.00 






TNR 


0.86 


0.48 


0.95 


0.56 


0.96 


0.58 


0.03 


0.00 




ort 


MSE 


2.05 


1.85 


2.11 


1.64 


2.13 


1.59 


1.95 


1.02 






TPR 


0.98 


0.99 


0.97 


0.99 


0.97 


0.99 


0.99 


1.00 






TNR 


0.41 


0.21 


0.44 


0.27 


0.45 


0.28 


0.16 


0.00 



loadings were relatively large. This phenomena has been described in Section 12.21 On the 
other hand, the MC+ based on the oblique model estimated a loading matrix where the 
first column was sparse. The AGFI and GFI of the oblique model were 0.78 and 0.87, 
respectively, which might be large enough to conclude that the estimated model fit the 
observed data. 

5 Concluding remarks 

In exploratory factor analysis, the lasso based on the orthogonal model often fails in 
approximating the oblique structure. We have shown that this disadvantage comes from 
the rotation problem of factor loadings. Then, a maximum penalized likelihood factor 
analysis based on the oblique model has been proposed to handle this problem. Our 
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Table 3: Mean squared error, the true positive rate (TPR), and true negative rate (TNR) 
for Model (C). In the second column, "obi" and "ort" indicate the oblique model and 
orthogonal model, respectively. In the last column, "P /V" indicates the promax rotation 
for oblique case, and the varimax rotation for orthogonal case. 

Penalization Rotation 
AIC BIG CMC — — 



N 






MC+ 


lasso 


MC+ 


lasso 


MC+ 


lasso 


lasso 


P/V 


50 


obi 


MSE 


8.54 


10.0 


7.68 


17.2 


8.84 


20.6 


— 


— 






TPR 


0.99 


1.00 


0.92 


0.97 


0.86 


0.92 


— 


— 






TNR 


0.43 


0.14 


0.85 


0.36 


0.96 


0.44 


— 


— 




ort 


MSE 


28.0 


15.3 


27.4 


15.5 


22.9 


20.3 


— 


— 






TPR 


0.98 


1.00 


0.97 


0.99 


0.94 


0.92 


— 


— 






TNR 


0.26 


0.06 


0.32 


0.20 


0.51 


0.36 






100 


obi 


MSE 


3.51 


5.73 


1.79 


12.4 


2.08 


13.2 










TPR 


1.00 


1.00 


0.99 


1.00 


0.98 


0.99 










TNR 


0.58 


0.14 


0.99 


0.41 


0.99 


0.43 








ort 


MSE 


29.5 


15.5 


23.0 


11.9 


13.8 


12.6 










TPR 


0.98 


1.00 


0.98 


1.00 


0.97 


0.99 










TNR 


0.32 


0.04 


0.51 


0.13 


0.74 


0.22 






200 


obi 


MSE 


0.97 


3.39 


0.67 


9.24 


0.67 


10.2 


2.18 


1.68 






TPR 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 






TNR 


0.91 


0.13 


1.00 


0.44 


1.00 


0.48 


0.00 


0.00 




ort 


MSE 


30.2 


16.1 


19.2 


12.1 


9.12 


11.1 


18.0 


13.0 






TPR 


0.99 


1.00 


0.99 


1.00 


0.99 


1.00 


1.00 


1.00 






TNR 


0.41 


0.03 


0.65 


0.07 


0.86 


0.09 


0.02 


0.00 



modeling strategy has been investigated through Monte Carlo simulations and the analysis 
of a real data. Simulation results show that the proposed procedure can yield much 
smaller mean squared error and true negative rate compared with the penalized likelihood 
factor analysis via the orthogonal model. Furthermore, the MC+ often produced sparser 
solutions than the lasso, so that the true factor structure can often be reconstructed. In 
the Harman's psychological data example, the orthogonal model estimated factor loadings 
where first column was dense, whereas our procedure produced sparse factor loadings. 

As a future research topic, it would be interesting to construct a penaliz ation procedure 

via n onconvex penalties for structural equation modeling, such as LISREL fjjoreskog and Sorbom 



19961 ). which is able to express much more complex covariance structure between observ- 



able variables and common factors. In this paper, the tuning parameter was selected by 
the information criteria based on the degrees of freedom of the lasso. The degrees of free- 
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Table 4: Loading matrices estimated by MC+ based on both orthogonal and oblique 
models at 7 = 2.10 for 24 psychological tests data. The value of p was selected by the 
BIC. 



MC+ (Orthogonal) MC+ (Obhque) 



0.75 


-0.08 


0.00 


0.00 


0.73 


0.00 


0.00 


0.00 


0.46 


0.00 


0.00 


0.00 


0.47 


0.00 


0.00 


0.00 


0.56 


0.00 


0.16 


0.00 


0.66 


0.00 


-0.18 


0.00 


0.59 


0.00 


0.00 


0.00 


0.58 


0.00 


0.00 


0.00 


0.47 


0.63 


-0.17 


0.00 


0.00 


0.72 


0.22 


0.00 


0.49 


0.67 


0.00 


0.00 


0.00 


0.75 


0.00 


0.17 


0.48 


0.67 


-0.12 


-0.12 


0.00 


0.79 


0.13 


0.00 


0.56 


0.41 


-0.16 


0.00 


0.23 


0.48 


0.19 


0.00 


0.49 


0.71 


0.00 


0.00 


0.00 


0.81 


0.00 


0.13 


0.17 


0.14 


-0.81 


0.24 


-0.31 


0.00 


0.93 


0.11 


0.36 


0.14 


-0.42 


0.34 


0.00 


0.00 


0.44 


0.35 


0.37 


-0.10 


-0.63 


0.13 


0.14 


-0.19 


0.71 


0.00 


0.59 


0.00 


-0.41 


0.00 


0.37 


0.00 


0.44 


0.00 


O.ZO 


O.ZO 


0.00 


0.4o 


U.UU 


U.UU 


n nn 
U.UU 




0.26 


0.15 


0.00 


0.45 


0.00 


0.00 


0.00 


0.53 


0.51 


0.00 


0.10 


0.42 


0.44 


-0.14 


-0.15 


0.49 


0.26 


0.19 


-0.11 


0.54 


0.00 


0.00 


0.00 


0.65 


0.43 


0.00 


-0.19 


0.43 


0.28 


-0.20 


0.18 


0.43 


0.38 


0.06 


0.00 


0.30 


0.22 


0.00 


0.00 


0.35 


0.56 


0.25 


0.00 


0.16 


0.37 


0.24 


0.00 


0.17 


0.55 


0.00 


-0.31 


0.16 


0.38 


0.00 


0.39 


0.00 


0.56 


0.24 


0.00 


0.16 


0.37 


0.23 


0.00 


0.18 


0.67 


0.19 


-0.10 


0.10 


0.50 


0.22 


0.15 


0.00 


0.43 


0.29 


-0.41 


0.24 


0.00 


0.22 


0.47 


0.22 



dom of the lasso are usually applied to the regression model, whereas we have not given 
a mathematical support for the degrees of freedom of the lasso in factor analysis model 
yet. Another interesting topic is to provide a theoretical justification of the information 
criteria given by Section 13.3.11 
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Appendix: Derivation of complete-data penalized log- 
likelihood function in EM algorithm 

In order to apply the EM algorithm, first, the common factors /„ can be regarded as 
missing data and maximize the complete-data penalized log-likelihood function 

p m 

/J^(A,*,$) = 5^1og/(a.„,/„)-iV5^5^pP(|A,,|), 

n=l 1=1 j=l 

where the density function /(a;„, fn) is defined by 

/(.„,/„) = (2.)-/^|*rV^exp |_ (^--A/„f-P-(x„-A/,.) | 



i=l 



n (2vrV^.)-'/'exp 



2^ 



Then, the expectation of 1^ can be taken with respect to the distributions A, ^, $), 

E[l^{A, $)] = _!^iP±^ iog(2.) log^. 

i=l 



1 ^ 



n=l 1=1 ^* 



M 1 f ^ -\ P m 

-- log |$| - E ^"^^[i^^i^Jl^n] - E E ^^(1 

ln=l J i=l j=l 



A., I) 



For given Aqm, *oid and $oid, the posterior /(/„|a;„, Aqm, *oid, $oid) is normally dis- 
tributed with E [F„ | = \l,^<ff-,lXn and E[FnF^\Xr,] = M~^ + E[Fn\Xn]E[Fn\Xnf, 

where M = A^i^^^i^^oid + $oid- Then, we have 



N N 



Y,E[Fn]xn. = EM^'Ajid*o">„x„, = iVM^iAjid^oldS. 



n=l n=l 
N N 



Y,E[FnF^] = E(M"' + M-^Ajid*old^n^n*oldAoldM-l) 

n=l 

iV(M-i + M-iA^i,*-iS*-i>oidM-i), 



n=l n=l 



Let M-iA;^id*-i^Si and M-^ + M-iA^id*-^S*-i;^AoidM-i be and A, respectively. 
Then, the expectation of 1^' in can be derived. 
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